Nguyen A, Yosinski J, Clune J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 427-436.
1. Overview
1.1. Motivation
A recent study revealed that changing an image in a way imperceptible to humans can cause a DNN to label the image as something else entirely.
In this paper:
- show that it is easy to produce images that are unrecognizable to humans, yet a DNN believes with 99.99% confidence that they are recognizable objects
- Evolutionary Algorithm
  encoded directly (row 1) and indirectly (row 2)
- Gradient Ascent
1.2. Procedure
1.3. Dataset
- MNIST
- ImageNet
1.4. Network
- AlexNet
- LeNet
- CaffeNet
1.5. Discussion
The area a discriminative model allocates to a class may be much larger than the area occupied by training examples for that class.
Applications: security cameras (face/voice recognition), search engine rankings (via an image's background), driverless cars (generating fooling images).
2. Evolution
2.1. Direct Encoding
- unrecognizable to humans
- each pixel is initialized uniformly at random within [0, 255]
- each pixel is chosen for mutation with probability 10%; the rate is halved every 1000 generations
- chosen pixels are mutated with the polynomial mutation operator at a fixed mutation strength of 15 (sketched below)
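A minimal NumPy sketch of the direct-encoding mutation step. Fitness evaluation and selection are omitted, and the meaning of "strength 15" is assumed here to be the distribution index η of Deb's polynomial mutation; all of this is a sketch under those assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_genome(h=28, w=28):
    """Direct encoding: one gene per pixel, initialized uniformly in [0, 255]."""
    return rng.uniform(0, 255, size=(h, w))

def polynomial_mutation(gene, eta=15.0, low=0.0, high=255.0):
    """Deb's polynomial mutation; eta is assumed to be the 'strength 15' setting."""
    u = rng.random()
    if u < 0.5:
        delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
    return float(np.clip(gene + delta * (high - low), low, high))

def mutate(genome, generation, base_rate=0.10):
    """Each pixel is chosen for mutation with probability base_rate;
    the rate is halved every 1000 generations."""
    rate = base_rate * 0.5 ** (generation // 1000)
    child = genome.copy()
    for idx in zip(*np.nonzero(rng.random(genome.shape) < rate)):
        child[idx] = polynomial_mutation(child[idx])
    return child
```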
2.2. Indirect Encoding
- more recognizable to humans (regular patterns, though still not the labeled class)
- produces images containing compressible patterns (symmetry and repetition)
- based on a Compositional Pattern-Producing Network (CPPN), which takes a pixel's (x, y) coordinates as input and outputs that pixel's value (sketched below)
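A minimal sketch of the CPPN idea: a small network is queried once per pixel, taking that pixel's (x, y) coordinates as input and returning its intensity. In the paper the CPPN's topology and weights are themselves evolved; here a fixed, hand-picked network only illustrates how sine and Gaussian activations produce the repetition and symmetry that make CPPN images compressible.

```python
import numpy as np

def cppn_render(w1=6.0, w2=4.0, w3=3.0, h=64, w=64):
    """Render an image by evaluating a tiny fixed-topology CPPN at every pixel."""
    ys, xs = np.mgrid[0:h, 0:w]
    # Normalize pixel coordinates to [-1, 1].
    x = 2.0 * xs / (w - 1) - 1.0
    y = 2.0 * ys / (h - 1) - 1.0
    periodic = np.sin(w1 * x + w2 * y)      # sine -> repeated stripes
    radial = np.exp(-w3 * (x**2 + y**2))    # Gaussian -> radial symmetry
    out = np.tanh(periodic + radial)        # combine and squash to [-1, 1]
    return ((out + 1.0) / 2.0 * 255.0).astype(np.uint8)

img = cppn_render()  # a regular, compressible pattern, not a natural image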
2.3. Gradient Ascent
- find fooling images by maximizing the softmax output for a target class via gradient ascent on the input image
- employ L2 regularization to produce images with some recognizable features of the target classes (a dog's face, a fox's ears); see the sketch below
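A PyTorch sketch of this approach, assuming a pretrained classifier `model`. The step size, step count, and L2 weight are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def ascend_class_score(model, target_class, steps=200, lr=1.0, l2_weight=1e-4,
                       shape=(1, 3, 224, 224)):
    """Gradient ascent on the input image to maximize a class's softmax output,
    with an L2 penalty on the image to keep pixel magnitudes small."""
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)  # only the image is optimized
    img = torch.zeros(shape, requires_grad=True)
    for _ in range(steps):
        if img.grad is not None:
            img.grad.zero_()
        log_prob = F.log_softmax(model(img), dim=1)[0, target_class]
        objective = log_prob - l2_weight * img.pow(2).sum()
        objective.backward()
        with torch.no_grad():
            img += lr * img.grad  # ascend, not descend
    return img.detach()
```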
3. Experiments
3.1. Direct Encoding on ImageNet
Less successful at producing high-confidence images on this larger dataset compared with MNIST (larger dataset → less overfitting → more difficult to fool).
3.2. Indirect Encoding on ImageNet
More successful than direct encoding.
Evolution produces similar images for closely related categories.
3.3. Generalization
3.3.1. Same Architecture & Different Initialization
- many images that fool network A also fool network B
- still, some images fool one network but not the other
3.3.2. Different Architecture
- many fooling images generalize across DNN architectures
3.4. Train Network to Recognize Fooling Images
- (MNIST) evolution still finds high-confidence fooling images; the retrained network performs similarly to one trained without fooling images
- (ImageNet) on the contrary, retraining makes it noticeably harder to evolve high-confidence fooling images
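A sketch of the retraining setup: evolved fooling images are appended to the training set under a new (n+1)-th "fooling" label, and the network is retrained; in the paper this alternates with evolving fresh fooling images against each retrained network. Array names are placeholders.

```python
import numpy as np

def add_fooling_class(train_x, train_y, fooling_x, n_classes):
    """Append fooling images under a new label n_classes, so the retrained
    network predicts n_classes + 1 categories (original classes + 'fooling')."""
    fooling_y = np.full(len(fooling_x), n_classes, dtype=train_y.dtype)
    return (np.concatenate([train_x, fooling_x]),
            np.concatenate([train_y, fooling_y]))

# Iterative protocol (sketch):
#   1. retrain the DNN on the augmented set
#   2. evolve new fooling images against the retrained DNN
#   3. add them to the fooling class and repeat
```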